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INFORMATION-THEORETIC  NON-PARAMETRIC 
UNIMOOAL  DENSITY  ESTIMATION 

by 

P.  Brockett,  A.  Charnes,  K.  Paick 


Abstract 


-We  present^an  information-theoretic  method  for  nonparametric  density 
estimation  which  guarantees  that  the  resulting  density  is  unimodal .  The 
method  inputs  data  in  the  form  of  moment  or  quantile  information  and  con¬ 
sequently  can  handle  both  data  derived  and  non-data  derived  information. 
In  the  non-data  derived  situation  it  yields  a  method  for  obtaining  uni¬ 
modal  Bayesian  prior  distribution. 
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I .  INTRODUCTION 


In  many  problems  encountered  in  engineering  and  signal  processing, 
it  is  useful  to  estimate  the  probability  density  function  corresponding 
to  some  random  phenomenon  under  study.  Often  it  is  known  that  the 
density  is  unimodal,  and  the  first  few  cumulants  can  effectively  be 
estimated  from  the  available  data.  However,  the  precise  parametric 
formula  for  the  generating  density  cannot  be  determined  from  physical 
considerations  alone  and  often  might  not  match  any  of  the  commonly 

assumed  densities,  even  if  a  parametric  model  was  appropriate. 
Accordingly,  the  usual  technique  in  applied  research  is  to  exercise  a 
procedure  of  nonparametrlc  density  estimation  and  then  numerically  obtain 
the  estimated  density.  There  is  usually  no  closed  form  solution  to  such 
procedures  (e.g.,  window  estimates  or  kernel  estimates),  although 
numerical  computations  such  as  likelihood  ratio  detection  algorithms  can 
be  implemented  using  these  techniques.  Unfortunately,  the  resulting 
nonparametrlc  density  estimate  obtained  by  smoothing  the  empirical 
histogram  usually  does  not  exhibit  unimodality  even  when  it  is  known  that 
the  true  density  is  unimodal.  The  tail  behavior  of  these  density 

estimates  may  not  be  monotonic.  It  is,  however,  exactly  these  tails 

which  are  frequently  the  major  object  of  Interest. 

In  this  paper  we  present  a  new  method  for  unimodal  nonparametric 
density  estimation  which  is  not  based  upon  smoothing  the  empirical 
histogram  per  se.  Instead,  it  uses  a  method  of  Kemperman’s  [  1]  for 

transforming  moment  problems,  and  couples  this  with  an  information- 
theoretic  generalization  of  Laplace's  famous  "principle  of  insufficient 
reason"  to  obtain  a  unimodal  density  estimate.  The  resulting  density 


estimate  is  rendered  in  closed  analytic  form  and  can  easily  be  worked 
with  numerically,  e.g.,  for  optimal  (likelihood  ratio  or  Neyman-Pearson) 
signal  detection  in  noise. 

It  should  be  remarked  that  our  method  is  also  applicable  to  the 
situation  of  prior  information  which  is  not  necessarily  data  derived  and 
can  be  used  for  assessing  a  unimodal  prior  distribution  for  subsequent 
Bayesian  analysis.  This  topic  will  be  pursued  in  a  separate  paper  [2], 

In  section  II  we  present  the  information-theoretic  estimation 
procedure  that  we  have  employed.  In  section  III  we  present  the  method 
used  to  transform  the  problem  of  unimodal  density  estimation  to  an 
estimation  problem  involving  an  auxiliary  variable.  In  section  IV  we 
present  the  actual  nonparametric  estimation  procedure  which  ensures  a 
unimodal  density  estimate  and  which  exhibits  certain  pertinent 
characteristics  desired.  In  the  final  section,  we  give  some  numerical 
results  using  both  real  and  simulated  data. 

II.  MAXIMUM  ENTROPY  AND  MDI  DENSITY  ESTIMATION 

The  concept  of  statistical  information  and  density  estimation  for 
numerical  data  is  of  paramount  importance  in  statistics,  economics, 
engineering,  signal  detection,  and  other  fields.  Wiener  [3]  remarked 
quite  early,  in  1948,  that  the  Shannon  measure  of  information  from 
statistical  communication  theory  could  eventually  replace  Fisher's 
concept  of  statistical  information  for  a  sample.  For  example,  using  a 
measure  of  informational  distance  between  two  measures  first  developed  by 
Kullback  and  leibler  [4]  in  1951,  following  the  work  of  Khinchin,  it  has 
been  shown  how  to  estimate  the  order  in  an  autoregressive  time  series 
model,  how  to  estimate  the  number  of  factors  in  a  factor  analysis  model, 
and  how  to  analyze  contingency  tables  (cf.  Akaike  [5,6]  and  Gokhale  and 
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Kullback  [  7]).  Minimizing  this  statistical  "distance"  subject  to  the 
given  constraints  is  called  "Khinchin-Kullback-Leibler"  (K^L)  estimation, 
or  "minimum  discrimination  information"  (MDI)  estimation  in  the 
literature.  Mathematically  the  problem  is  to  pick  that  density  function 
f  which  is  "as  close  as  possible"  to  some  given  function  g,  and  which 
satisfies  certain  given  moment  constraints,  e.g., 

■In  /f(x)  m 

subject  to 

/•0(x) f(x)X(dx)  •  1  -  e0 
/•i(*)f (x)X(dx)  -  0, 

1  1  (2.1) 


•  • 

/•k(x)f(x)X(dx)  -  0^  # 

Here  \  is  some  dominating  measure  for  f  and  g  (usually  Lebesgue  measure 
in  the  continuous  case,  or  counting  measure  in  the  discrete  case), 
®1» •  •  • are  the  given  moment  values  for  the  moment  functions 
and  «0(x)sl.  The  moment  functions  a^(x)  may  be  used  to  generate  moment 
or  cumulant  constraints,  e.g.,  when  a^  is  a  polynomial,  or  may  generate 
centile  constraints,  e.g.,  when  aj  is  an  indicator  function  for  a 
terminal  interval. 

In  many  applications  there  is  no  a  priori  choice  of  a  distribution  g 
to  serve  as  a  goal  density  in  (2.1).  In  this  case  we  express  our 
ignorance  by  choosing  all  x  values  to  be  equally  likely,  i.e.,  g(x)sl. 
In  this  case  the  MDI  or  K^l  objective  functional  is  of  the  form 

/f(x)tnf (x)  X(dx)  -  -/f(x)tn  X<dx) 
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This  is  precisely  minus  the  entropy  of  the  density,  and  the  MDI  problem 
becomes  a  maximum  entropy  (ME)  problem.  The  ME  criterion  can  be  thought 
of  as  taking  the  most  "uncertain"  distribution  possible  subject  to  the 
given  constraints.  Accordingly,  this  principle  of  maximum  entropy  may  be 
construed  as  a  new  and  important  extension  of  the  famous  Laplace 
“principle  of  insufficient  reason,"  which  postulates  a  uniform 
distribution  in  situations  in  which  nothing  is  known  about  the  variable 
in  question.  (See  Guiasu  [8]  for  more  motivation  and  explanation  of 
these  results.)  Here  the  ME  distribution  is  as  close  to  uniform  as 
possible,  subject  to  the  given  informational  constraints. 

The  minimization  of  (2.1)  subject  to  the  given  constraints  is  easily 
carried  out  by  Lagrange  multipliers.  The  short  derivation  given  below 
can  essentially  be  found  in  Guiasu  [8]. 

Introducing  a  Lagrange  multiplier  for  each  constraint  in  (2.1)  and, 
changing  from  a  minimization  to  a  maximization,  we  wish  to  maximize 

L  -  /  f(x)  in  X(dx)  -  cfc[e0-Ja0(x)fCx)X(dx>] 

+  ...  +  ak[0k-/,k(x)f(x)x(<lx)] 


or,  equivalently. 


,  [8U)"t'{lV1‘1<*)}l 
”1  TO 


■«xax(x)|x.(dx) 


-  /  f(x) 


/  f<x)  | 


|'g(x)exp{-E^a1.1(x)j- 
tlx) 


A(dx) 


**ll  X(dx) 


The  Inequality  follows  since  mx^x-1  with  equality  only  at  x=l.  Thus, 
the  Inequality  becomes  an  equality  when 

*<*)  ■  g(x)«xp  • 

and  this  becomes  the  maximizing  density.  We  shall  call  (2.2)  the  MDI 
density  (or  the  ME  density  if  g(x)sl)  subject  to  the  constraints. 

The  numerical  value  of  the  constants  are  found  using  the  moment 
constraints.  In  Brockett,  Charnes,  and  Cooper [  9]  or  Charnes,  Cooper,  and 
Selford  [10],  It  Is  shown  how  to  obtain  the  constants  <*i  as  dual 
variables  In  an  unconstrained  convex  programming  problem.  We  shall 
discuss  this  fact  further  In  the  section  concerning  numerical 
computation,  since  In  the  dual  formulation  of  the  MDI  problem  the 
computation  Is  easily  accomplished  using  any  of  a  number  of  existing 
nonlinear  programming  codes. 

III.  TRANSFORMING  THE  UNIMODAL  DENSITY  ESTIMATION  PROBLEM 

In  this  section  we  will  show  how  to  use  the  Information  that  a 
density  Is  unlmodal  in  the  nonparametrlc  estimation  problem.  The 

technique  to  be  used  is  borrowed  from  Professor  J.  H.  B.  Kemperman, 
whose  1971  paper  [  l]  also  gives  more  advanced  and  wide  ranging  moment 
transformation  techniques. 

A  famous  characterization  of  "zero"  unlmodal  random  variables  (due 
to  L.  Shepp  following  the  work  of  Khinchin)  is  the  following.  Suppose  Y 
is  unlmodal  with  mode  zero.  Then  YaU*X  where  U  and  X  are  independent, 
and  U  Is  uniformly  distributed  over  [0,1].  A  proof  of  this  result  can  be 
found,  for  example.  In  Feller  (Ref.  11,  p.  158).  From  the  above  result. 
It  follows  Immediately  that 


Eh(Y)  *  Eh*(X) 


where 

h*(x)  *  E[h(UX)|X=x]  *  J  f  h(t)dt  . 

Our  technique  for  solving  the  problem  (2.1)  in  the  unimodal  case  may 
now  be  explained  as  follows.  If  Y  is  unimodal  with  mode  m,  then  Y-m  is 
zero  *  unimodal.  First  transform  the  given  moment  constraints 
Ja.(x)fY(x)dA(x)=ei  on  the  original  variable  Y  to  constraints  of  the  form 
/**<  x)fx(x)dX(x)=0.  on  an  auxiliary  variable  X,  where 

X 

ai(x)  mjf  . 

o 

If  the  mode  is  unknown,  then  a  consistent  estimator  fh  may  be  used. 
(See  Sager  C 123  for  such  a  nonparametric  mode  estimator.)  We  then  solve 
the  transformed  MDI  problem  involving  the  constrained  estimation  of  fy. 
Using  the  estimated  X  density  we  then  transform  back  to  obtain  the 
estimate  for  Y.  if  X  is  estimated  by  £,  then  Y  is  estimated  by  ®+u«x, 
and  consequently  is  unimodal  by  Khinchin's  theorem.  The  details  are 
given  in  the  next  section. 

IV.  OBTAINING  THE  ESTIMATED  DENSITY 
Decomposing  the  original  variable  Y  via  the  Khinchin  representation 
Y-m*U*X,  we  can  now  transform  the  constraint  set  in  (2.1)  into 
constraints  involving  X.  Namely,  by  Kemperman's  technique, 

0i  *  E [ai ( Y)]  *  E[ai(Y-m+m)]  =  E[a*(X)]  , 

where 

•*(*)  "  zf  -  i  J  *i(t)dt 

o  ■ 

As  an  illustration,  if  an  original  constraint  on  Y  is  a  raw  moment,  say 

Vx)  -  xk  , 
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then  it  follows  that 


•I<x)  -  [(x+«)k+1-«k:f  1  ^(k+l)x 

■  A(kr)'*-vw 

Thus  the  corresponding  constraint  in  the  auxiliary  variable  X  involves  a 
sum  of  moments  up  to  order  k.  Similarly,  if  the  constraint  upon  Y  is  a 
probability  or  centile  constraint  we  obtain,  for  example, 

*  -  PlY>0  -  e[II5<YJ]  -  -  E[«*<*>  ]. 

where 

*‘<,)  ‘  j  *  °-  *«- 

l  -  x^rm 

Having  transformed  the  moment  constraints  on  Y  into  constraints  on 
X,  we  then  estimate  the  density  for  X  via  maximum  entropy,  viz.,  fx  is  the 
solution  to 

max  fx(x)dx(x) 

subject  to 

•o  -  1  -  ffX(x)\(dx) 

®1  *  /  x<d*> 

•  • 

•  e 

•  • 


Using  the  sample  data  we  are  able  to  estimate  the  parameters 
®l,.»»»®kfor  the  desired  moment  functions  a^xj.agtx),...  .a^x)  by 

*  *  A 

'According  to  formula  (2.2)  the  estimate  for  the  density  of 

X  is 


The  numerical  value  for  the  constants  can  be  determined  using  the  dual 
convex  programming  formulation  outlined  in  Brockett,  Charnes,  and  Cooper 
[9]. 

An  additional  advantage  of  the  ME  procedure  described  above  is  that 
since  (4.2)  defines  a  member  of  the  exponential  family  of  distributions, 
the  usual  results  concerning  statistical  properties  (such  as  the 
existence  of  sufficient  statistics)  are  valid.  Under  mild  conditions  on 
the  estimator  e  of  0,  it  also  follows  that  the  parameter  estimates  are 
asymptotically  normally  distributed  with  derivable  covariance  matrix  (cf. 
Kullback  [13]).  This  allows  for  confidence  interval  statements  to  be 
made  concerning  f^. 

Now,  since  f^  is  estimated  via  (4.2),  we  may  transform  back  to  obtain 
the  distribution  for  Y.  Using  the  relationship  Y-m=U*X  we  have  the 
estimated  density  function  for  Y: 


gn(y-m) 


f„(x>  ±L 


(4.3) 


So 


f’<r>  *  'Crri 
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ry-m  f  *  . 

J  -»  exP  {■Zotia1(x)}  ^x  if  y*m 

(4.4) 

r  exp  if 

J  y-m  1  J  x 

The  unimodality  of  this  density  estimate  is  apparent  since 

-yfY(y+m)=fx<y)>°* 

In  the  next  section  we  give  some  examples  using  the  above  techniques. 

V.  NUMERICAL  RESULTS 

In  this  section  we  present  several  examples  of  the  various  types  of 
constraints  (moment  functions)  and  the  resulting  graphical  display  of  the 
estimated  density.  In  the  simulations  we  shall  also  display  the  "true" 
density  from  which  the  data  were  obtained.  Estimating  the  density  for  X 
via  minimum  MDI  (1.2)  can  easily  be  done  by  the  duality  theory  given  in 
Charnes,  Cooper,  and  Seiford  DO- 

The  unimodal  estimation  parameters  {<*i}  in  the  density  fx(x)  are 
dual  variables  in  the  unconstrained  convex  programming  problem  (5.1),, with 
a*(x)  replacing  a^(x)  and  x(dx)=dx  in  (2.1).  According  to  the  duality 
result  of  Charnes,  Cooper,  and  Seiford  [10].  the  problem  (2.1)  has  an 
unconstrained  dual  problem. 

k  c  k 

max  £  «iei  -  I  g(x)  exp  -  £ 

i=0  J  i=0 

If  the  density  is  to  be  estimated  over  the  entire  interval  (-®,~), 
then  some  algebraic  moment  constraint  must  be  included  as  one  of  the 
constraining  equations.  For  the  numerical  examples  given  in  this  section 
we  have  used  the  expected  value  of  the  corresponding  moment  constraint  on 
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the  auxiliary  variable  X  which  is  EVfyi=|jVUlfx(x)dx. 

A  further  result  which  should  be  noted  is  that  if  one  wishes  to 
impose  a  continuity  constraint  upon  the  estimated  density  fy  at  the  mode 
m,  then  this  continuity  constraint  of  Y  translates  into  another  moment 
constraint  upon  the  auxiliary  variable  X  of  the  form 

0  "  /  ak+*(x>  fx<*>dx 

where  a,x+1(x)-l/x  (there  is  no  corresponding  a^+^x)  constraint  on  Y, 
but  from  (4.3)  we  see  the  constraint  (5.2)  amounts  to  fy(tt-o)-fY(o+o) ) . 
Three  different  goal  densities  were  chosen  for  illustration: 


g^x)  -  1 


g2<*) 


exp  £«-( | x | -6)2- ^jij-  -  for  |x|*6 


for  |x|»6 


*3(x)  -  — - — -  exp 

/5x  <? 


The  goal  density  gj(x)«i  corresponds  to  maximum  entropy  estimation  for 
fx.  To  impose  smoothness  on  the  estimated  density  fy  near  the  mode  m, 
the  goal  density  gg(x)  was  used.  This  goal  density  behaves  like  the 
constant  1  outside  (x|<g,  and  dips  smoothly  to  zero  as  |x|*0,  i.e.  the 
goal  density  approximates  the  ME  procedure  given  by  g^(x),  but  constrains 
the  estimated  density  fy  to  be  smooth  around  the  mode.  The  goal  density 
93U)  corresponds  to  the  fx  density  which  would  result  from  fy  being 
normally  distributed.  Hence  this  goal  density  gives  the  "close  to 
normality  subject  to  constraints"  interpretation  for  the  estimated 
density  fv. 


Three  examples  of  predictive  distributions  are  presented.  These  are 
simulated  observations  from  a  normal  distribution,  a  Cauchy  distribution, 
and  some  real  data  from  acoustical  returns  generated  by  biological 
undersea  noise  in  ambient  background  noise. 

Normal  Distribution 

The  following  constraints  were  imposed: 

(1)  Y  is  unimodal  with  the  most  likely  value  for  m  as  5. 

(2)  Pr[— *Y<5]  =0.5. 

(3)  PrI4*Ys61  =  0.6826. 

Figure  1  shows  the  estimated  distribution  for  each  of  the  goal  densities. 
Note  that,  when  the  data  is  truly  normal,  reasonable  estimates  are 
obtained  using  only  the  three  pieces  of  Information  above.  Our  procedure 
can  also  be  used  even  If  there  are  no  data,  but  just  auxiliary 
information  (cf.  Brockett,  Charnes,  and  Paick  [2]). 

Cauchy  Distribution 

To  determine  the  performance  of  this  method  when  the  moment 
constraints  have  been  derived  from  data,  we  generated  1000  Cauchy  random 
numbers  using  the  congruential  generator  RANF  in  the  standard  FORTRAN 
library.  The  following  moment  constraints  were  prescribed  using 
estimates  derived  from  the  data. 

(1)  Y  is  unimodal  with  possible  value  between  -400  and  400. 

(2)  The  most  likely  value  for  m  is  0. 

(3)  The  distribution  of  Y  Is  symmetric  about  0. 

(4)  P[-3.5325*Y$3.5325]  «  0.8. 

(5)  Expected  value  of  VW  a  1.427552. 


Figure  2  shows  the  results  of  the  calculation  using  the  above  information 
for  each  of  the  goal  densities.* 

Distribution  of  Underwater  Acoustic  Returns 

The  ambient  noise  was  recorded  in  a  shallow  water  coastal  region 
which  was  dominated  by  acoustic  energy  from  snapping  shrimp.  30,000 
recorded  samples  were  used  to  obtain  a  distribution  for  these  underwater 
acoustic  returns.  The  following  information  was  obtained  from  the  data: 

1.  Y  is  unimodal  with  possible  values  between  >1024  and  767. 

2.  The  most  likely  value  for  m  is  0. 

3.  The  following  probabilities  are  obtained  from  the  data: 

Pr[-1024sYs0]  =  0.50 
Pr[-1024SYs-7l]  *  0.05 
Pr[-20sY*0]  =  0.25 
Pr[0<Y<20]  *  0.25 
Pr[0sY<7l]  =  0.45 

4.  Expected  value  of  VM  =  5.03,  as  estimated  from  the  data. 
Figure  3  shows  the  predictive  distribution  for  each  of  the  goal  densities 
based  on  this  information. 


*  2 

(Range/4)  is  adopted  as  a  variance  of  the  "normal"  goal  density  g3(x) 
for  illustrative  purposes.  Although  the  variance  of  the  Cauchy 
distribution  does  not  exist,  the  normal  shape  can  still  be  a  permissible 
goal  density. 
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